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Abstract 

Seemingly unrelated regressions are statistical regression models based on the Gaus- 
sian distribution. They are popular in econometrics but also arise in graphical 
modeling of multivariate dependencies. In maximum likelihood estimation, the pa- 
rameters of the model are estimated by maximizing the likelihood function, which 
maps the parameters to the likelihood of observing the given data. By transforming 
this optimization problem into a polynomial optimization problem, it was recently 
shown that the likelihood function of a simple bivariate seemingly unrelated regres- 
sions model may have several stationary points. Thus local maxima may complicate 
maximum likelihood estimation. In this paper, we study several more complicated 
seemingly unrelated regression models, and show how all stationary points of the 
likelihood function can be computed using algebraic geometry. 
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1 Introduction 



Seemingly unrelated regressions (SUR) are multivariate regression models with 
correlated response (or dependent) variables that follow a joint Gaussian dis- 
tribution. Usually different regressions contain different covariates (or inde- 
pendent variables) and seem "unrelated." However, due to the correlated re- 
sponse variables the regressions are onl y "seern i ngly u nrelated" and contain 



valuable information about each ot her (IZellnei . 119621) . SUR play "a central 



role in contempora ry econo r aetrics " (iGoldbergei 



1991L p. 323) but also appear 



in other contexts ( Rochon , Il996al lb[ Verbvla and Venablesl . 19881 ) . Moreover 



^ Supported by the University of Washington Royalty Research Fund Grant No. 
65-3010. 



Preprint submitted to Elsevier Science 



2 February 2008 



SU R arise in the context of Gaussi an graphical models ()Andersson et al 
§5; iRichardson and Spirte 2. bnni §8.5). 



2001 



The parameters of a SUR model can be estimated efficiently, i.e. with small 
variance, by maximizing the likelihood function, w hich maps the parame- 
ters t o th e likelihood of observing the given data. lOberhofer and Kmenta 
( 1974 ) and lTelseii (|l96J) give two popular algorithms for this maximization. In 
general, however, these algorithms will not globally maximize the likelihood 
funct ion, which indeed may be m ultir nodal; a fact neglected in th e litera- 



ture ( Drton and Richardson . 20041 !)6). lDrton and Richardson (2004) demon- 
strated the possibility of multimodality in a study of a bivariate SUR model 
that may have a likelihood function with five stationar y points. In this paper, 
we us e algebraic geometry to apply the approach of iDrton and Richardson 
( 2004^ to more general SUR models. In Sections 2 and 3 we give an intro- 
duction to SUR and show how maximum likelihood estimation can be per- 
formed by solving a polynomial optimization problem, opening the door for 
tools from algebraic geometr y. With these tools, we first revisit the work by 
Drton and Richardson ( 20041 ). see Section 4, and then obtain new results on 
more general SUR models (Section 5). In particular, we identify examples of 
SUR models, for which all stationary points of the likelihood function can be 
computed. 



2 Seemingly unrelated regressions 



In SUR a family of response variables, indexed by a finite set R, is stochas- 
tically modeled using a family of covariates, indexed by a finite set C. All 
response variables and all covariates are observed on a finite set of subjects 
A^. We denote the cardinalities of the three sets also by R, C and A^, respec- 
tively. The observations can be represented by two matrices X and Y . The 
matrix Y = {Yrm) e R^""^ has the (r, m)-entry equal to the observation of re- 
sponse variable r G i? on subject m & N, and the matrix X = (Xcm) G M*^^^ 
has the (c, m)-entry equal to the observation of covariate c G C on subject 
m e N. For c G C and r e R, Xc eR^ and Yr G denote the c-th and r-th 
row of X and Y, respectively. Similarly, X"^ and Y"^, m & N, denote the m-th 
column of X and Y, respectively. Clearly, Xc and Yr comprise all observations 
of the c-th covariate and the r-th response variable; X"^ and comprise all 
covariate and response variable observations on the m-th subject. 

In this regression setting, the matrix X is assumed to be deterministic and 
fixed but the matrix Y is modeled to follow a multivariate normal distribution, 
where the mean vector of F^, r G i?, is a linear combination of some X^, 
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1) 



ceCrQC, 

E[Yr] = PrcXc e M^, re R. 

ceCr 

Here (C^ | r G i?) is a fixed family of subsets of C indexing the covariates 
involved in each one of the R regressions. The weights Pre in (1) ^-re called 

regression coefficients. Setting fSrc = if c ^ C,., we can define a matrix of 



regression coefficients B = [[3r 



oRxC 



The random vectors Y™-, m E N, are 



assumed to be independent with common positive definite covariance matrix 

Var[r"^] = E e M^^^, m E N. (2) 

Letting 

Ct^ = U({r} X Cr\r e R) C Rx C, (3) 
the seemingly unrelated regressions model is the family of normal distributions 



(4) 

Here N'rxN is the multivariate normal distribution on M.^^^; Jjv is the N x N 
identity matrix; (E) is the Kroncckcr product; B and S arc the mean and 
the variance parameters; and the parameter space M{Cti) x P is the Cartesian 
product of the linear space 



nRxC 



B={prc), Prc = 0^{r,c)^Cn} 



(5) 



and the cone P of all positive definite real RxR matrices. The response matrix 
Y is then an observation from some (unknown) distribution in the model, 

Y ^N'{BX,^®In), {B,E) eM{Cn) xF. 

If N > R + C and X is a matrix of full rank, then with probability one the 
{R + C) X N matrix obtained by stacking X and Y has full rank. 



rank 



^R + C. 



(6) 



We assume (6) to hold throughout the paper. 



3 Mciximum likelihood estimation by polynomial optimization 



The probability density function f(B,T,) '■ ^^^'^ 
J\f{BX, E (g) /„) can be written as 



(0, oo) of the distribution 



^(27r)«^|E|^ 



exp 



:tr 



j:~\y - BX){Y - Bxy 
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For data Y, the likelihood function L : B(C7^) x P ^ (0, oo) of the model 
N(C7^) is defined as 

L(S,S) = /(B,E)(V'). 
In maximum hkehhood estimation the parameters {B, S) are estimated by 

= argmax{L(S,S) | (S, S) G M{Cn) x P}. (7) 

It follows from (6) that the maximum of the likelihood function exists. 

We can parameterize M{Cti) by mapping a vector 

(3= {Pre I {r,c) eCn) eR'^^, 

to the matrix B{P) G M{Cti) with entry B{f3)rc = Pre if ('"i c) G Ct^ and 
B{P)rc = otherwise. Define £ : M^'^ x P ^ M by 

£(/3,S)=logL(i?(/9),S) 

cx -y log |E| - ^tr [S-I (r - B{P)X) (y - S(/?)x)'] . 

Clearly we can solve (7) by finding 

0, t) = argmax{£(/?, S) | E) G M'^^ x P}, (9) 

and setting B = B{j3). The standard approach to solve (9) is to solve the 
likelihood equations 

( di{p,E) di{p,j:) \_ 

[ dp ' di: ) ^ ^ 

It can be shown that (10) holds if and only if 

s = l(y - Bip)x) (y - Bip)x)' (11) 

and 

P= [A'{XX' 0E-^)aY^ A'vec{J:~^YX'), (12) 

where A is a matrix of zeroes and ones that satisfies vec{B{P)) = Ap. In 
fact, each column of A has precisely one entry equa l to one and the remaining 
entries equal to zero. lOberhofer and Kmental (|l974^ show how one solution to 
the likelihood equations can be obtained by alternating between solving (11) 
for fixed P and solving (12) for fixed S. Here, we take a different approach that, 
for certain SUR models, allows us to compute all solutions to the likelihood 
equations. 

From (6) and (8), it follows that for fixed P G Mp'^ the function £^ : S i— 
£{P, S) is strictly concave with maximizer (11). Thus the profile log-likelihood 
function iprof '■ R'^^ R defined as 

^p,of(/?) = max{£(/3,S)|SGP} (13) 
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takes on the form 



Vof(/^) oc -y log - B{P)X) (y - BiP)X)' 



RN 



(14) 



By the strict con-cavity of ip, (/?, S) is a stationary point of £( 0,12) if and only 

i f is a stationary point of ^prof (/?) and S satisfies (11); compare IPrton and Richardson 
( 20041 Lemma 1). The same holds for 



G{f3) 



{Y-Bif3)x){Y-BiP)x)' 



(15) 



which conveniently is a polynomial in 0. Thus we can solve (9) by using (11) 
and solving the unconstrained polynomial program 



/3 = argmin{G'(/?) | P e 



(16) 



We try to solve (16) by computing the stationary points of G, i.e. by solving 
the equations 

dG{0) 



9r 



0, {r,c)eCn. 



(17) 



In practice the observations Y and X are available only in finite accuracy 
and the partial derivatives grc, {r, c) G C-ji, are elements of the ring Q[(3] of 
polynomials in with rational coefficieri t s. In an algebraic appro ach to solving 
polynomial equations ( Cox et all . Il997 . 19981 : Sturmfeli . l2002| ) we allow the 
indeterminants in the polynomial equation system (17) to be complex, i.e. 
f3 e C''^, where C is the field of complex numbers. We define the maximum 
likelihood ideal Iq to be the ideal of Q[(3] that is generated by the partial 
derivatives grc, {r, c) G Cti, 



I.e. 



= {grc I (r,c) e Cn); 



compare ISturmfelsl (|2002L §8.4) who defines maximum likelihood ideals in a dif- 
ferent statistical context. Software like Macaulay 2 ^ and Singular (ICreuel et al, 
200lh permits us to check whether Iq is a zero-dimensional ideal. If dim(JG') = 
0, then the variety Vc{Ig), i-e. the set of common complex zeroes of the partial 
derivatives Qrc, is a finite set and all its elements can be computed using, for 
example. Singular or also PHCpack^ . The real points Vk{Ig) = Vc{Ig) H 
can then be identified and yield the stationary points of G. 



http : //www . math . uiuc . edu/Macaulay2/ 
http://www.iiiath.uic . edu/~ jan/j 
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> ring R=0, (b(l. .2)) , Ip; 

> matrix X[2][8] = 188,22,-46,77,-103,74,83,101, 

55,-216,116,-30,131,195,-311,-239; 

> matrix Y[2] [8] = 234,-5,6,182,-193,278,62,-68, 

497 , -326 , 266 , -3 , 93 , 558 , -584 , -224 ; 

> matrix B[2][2] = b(l),0, 0,b(2); 

> poly G = det((Y-B*X)*transpose(Y-B*X)) ; 

> ideal IG =jacob(G) ; 

> ideal J = groebner(IG) ; 

> dim(J) ; vdim(J) ; 
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> LIB "solve. lib"; solve(J,6); 
[1] : 

[1]: 0.778796 
[2]: 1.538029 
[2] : 

[1]: 1.622609 
[2] : 2 . 034745 
[3] : 

[1]: (1.480687-1*1.547274) 
[2] : (2 . 16845+1*0 . 765283) 
[4] : 

[1]: (1.480687+1*1.547274) 
[2] : (2 . 16845-1*0 . 765283) 
[5] : 

[1]: 2.764418 
[2]: 2.504006 

Table 1 

Singular-session for the model in lDrton and Eichar d^ (j2nn4l v 



4 Revisiting the multimodal bivariate seemingly unrelated regres- 
sions with two covariates 



Drton and RichardsonI (j200J) study a SUR model with two response variables 
and two covariates, in which response variable 1 is regressed only on covariate 
1, and response variable 2 only on covariate 2. Hence, R = {1, 2}, C = {1, 2}, 
Ci = {1}, and C2 = {2}. Therefore, Cn = {(1, 1), (2, 2)}, and B e M{Cn) if B 
is of the form 



B 




1)2x2 



Using Singular and the data in iDrton and RichardsonI (|2004l Table 1), we 
can solve (16) as shown in Table 1. 
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As computed by dim and vdim, the maximum hkehhood ideal Ig = IG is 
zero-dimensional and of degree five. The five points in the variety Vc{Ig) are 
computed by solve, which lists /^n = b(l) as first component and /322 = 
b(2) as second component. There are three real points in V^{Ig), which yield 
the stationary points of the likelihoo d function of the mod el NfC -T?.). Note 
that we confirm the values stated in Drton and RichardsonI ^2Q04 . Table 2 
with = /3i and /322 = [^2)- The Grobner basis computed by the command 
groebner(IG) has two elements that are (i) a quintic in j322 = b(2) and (ii) 
a sum of a linear function in = b(l) and a quartic in (322 = b(2). Thus it 
follows imme diately that the stationary po ints of G can be found from solving 
a quintic (cf. IPrton and RichardsonI . 120041 . Thm. 2). 



5 Dimensions and degrees of maximum likelihood ideals 



5.1 Seemingly unrelated regressions 



The algebraic approach can also be applied to more general models. Here we 
focus on SUR models N(C7^) for which (C^ \ r E R) consists of disjoint sets; 
in other models inclusio i i rela. tions among the sets Cr may be exploited (cf. 
Andersson and Perlmanl . Il994| ). More precisely, we consider models T^{Cti) in 



which ri < r2, ri,r2 e R, implies that Ci < C2 for all ci e C^.^ and C2 G Cr2- 
Then M{Cti) is a linear space of block- diagonal matrices. 

Table 2 states the dimension and degree of the maximum likelihood ideal for 
seven examples including the one from Section 4. For the models with zero- 
dimensional maximum likelihood ideal Iq, we can find all stationary points of 
the likelihood function by computations analogous to the ones demonstrated 
in Table 1. The likelihood functions of these models may be multimodal and 
it would be interesting to find, for each model, reference data for which the 
cardinality of Vr(/g) is large. For example, let C-ji = {(1, 1), (2, 2)} and choose 



X 



Y 



0.65 


-0.80 


1.34 


-1.03 


-1.08 


0.04 


-1.18 


1.98 


-2.42 


-3.75 


0.14 


-0.73 


1.40 


-2.29 


-3.30 


0.52 


-1.93 


3.02 


-6.67 


-9.94 



(19) 



then the variety of the maximum likelihood ideal of N(C7^) is purely real, i.e. 
Mr(-^g) = Vc{Ig) = 5. Figure 1 shows a three-dimensional plot and a contour 
plot of the profile log-likelihood function for these observations. We conjecture 
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Cn 



M{Cn) 



dim(/(3) degree(/G) 



{(1,1),(2,2)} 
{(1,1), (1,2), (2, 3)} 

{(1,1), (2, 2), (3, 3)} 
{(1,1),(1,2),(1,3),(2,4)} 



{ Pii \ 

\ 1322 J 

( Pii Prz \ 
\ 1323 J 



I3ii 
f322 
1333 



/3ii /3i2 /3i3 
/324 



29 



{(1,1),(1,2),(2,3),(2,4)} 



{(1,1), (2, 2), (3, 3), (4, 4)} 



{(1,1), (2, 2), (3, 3), (4, 4), (5, 5)} 



/3ii /3i2 

P23 P24. 

' 

^22 

fts 

. ^44, 




/322 
fyi 
/344 
/355. 



32 



80 



Table 2 

Dimension and degree of maximum likelihood ideals. 




1 beta1 



-2 -3 




Fig. 1. Three-dimensional plot and contour plot of profile log-likelihood function. 
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C-ji Subspace of B(C7e,) dim(/(5) degree(JG) 

{(1,1), (2, 2)} [%^;^^) 3 

{(1,1), (1,2), (2, 3)} 7 

//3ii \ 

{(1,1), (2, 2), (3, 3)} /3n 11 

\ P33 J 

{(1,1),(1,2),(1,3),(2,4)} (^^^^^^^^%°3) 11 

{(1,1), (1,2), (2, 3), (2, 4)} [^'^^'^IJ,)) 23 

{(1,1), (2, 2), (3, 3), (4, 4)} fT "^S°3 ) 63 

\ /344/ 

Table 3 

Dimension and degree of maximum likelihood ideals of submodels. 

that data with V^ilc) = Vc(/g) exist for all three models in Table 2 that 
have zero- dimensional maximum likelihood ideal. For models with maximum 
likelihood ideal of dimension one or higher, it is not clear whether \r{Ig) = oo, 
i.e. a likelihood function with an infinite number of stationary points, can occur 
with non-zero probability. 



5.2 Submodels of seemingly unrelated regressions 



It is obvious that the algebraic approach developed in Section 3 immediately 
carries over to the submodels of SUR that are of interest in testing equality 
of regression coefficients. In the model N(C7^) with C-ji — {(1, 1), (2, 2)}, for 
example, we may be interested in testing whether = f322- If this is done 
using a likelihood ratio test, then the likelihood function of the submodel in 
which Pii = /?22 is imposed has to be maximized. More precisely, the submodel 
has the restricted parameter space 

{B e M{Cn) I Pii = M X P. (20) 

Table 3 lists similarly obtained submodels of the models in Table 2, for which 
the maximum likelihood ideal is zero-dimensional and the variety Vc{Ig) can 
be computed. 

It should also be noted that submodels of SUR need not inherit unimodal 
likelihood functions from their parent model. For example, the bivariate SUR 
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model N(C7^) with C-jz = {(1, 1), (2, 1), (2, 2)} is monotone, i.e. the family 
{Cr I r G -R) is totally ordered by inclusion, which guarantees that the likeli- 
hood funct ion has precisely one sta tionary point correspondin g to the global 
maximum (jAndersson and Perlman . .1994i Drton et al. . 20031 ). However, the 
submodel induced by the restriction = (52i can be reexpressed in the form 
of the model studied in Section 4 by means of the linear transformation that 
changes response Y2 into Y2 — Y1. Hence, the submodel does not always have 
a unimodal likelihood function. 



6 Conclusion 



The presented algebraic approach to maximum likelihood estimation in SUR 
permits us to compute all stationary points of the likelihood function if the 
maximum likelihood ideal is zero-dimensional. This is the case for three seem- 
ingly unrelated regressions models considered in this paper (cf. Table 2): 
(i) the previously studied model based on Ct^,*-^^ = {(1, 1), (2, 2)}, (ii) the 
model with Cn^"^^ = {(1, 1), (1, 2), (2, 3)}, and (iii) the model with Cn^^'^ = 
{(1, 1), (2, 2), (3, 3)}. Additionally, interesting submodels of SUR may have a 
zero-dimensional maximum likelihood ideal (cf. Table 3). The computations 
in Singular that find all stationary points of the likelihood functions of the 
models with zero- dimensional maximum likelihood ideal are instantaneous for 
all but the model in Table 3 that has a maximum likelihood ideal of degree 
63. Thus we advocate the use of Singular or similarly capable software in 
statistical data analysis. 



In future work it would be interesting to find reference data sets leading to 
likelihood functions with a large number of stationary points. Moreover, the 
algebraic appr oach presented herein could be combined with re gression ap- 



proaches (e.g. Andersson and Perlman . 1994t Drton et al. . 2003| ) in order to 



identify larger classes of SUR models for which all stationary points of the 
likelihood function can be computed. Finally, i t could be explored whether 
methods for global minimization of polynomials ( Parrilo and Sturmfelsl 20031 ) 
can be used to find the global maximum of SUR likelihood functions. 
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